Update to v2.0.0-alpha.1 by xylar · Pull Request #944 · MPAS-Dev/compass

xylar · 2026-03-21T11:49:32Z

This pull request updates to mache.deploy, which uses the ./deploy.py script instead of ./conda/configure-compass-env.py.

It switches to using pixi in the background for creating environments with conda packages.

Updates:

esmf v8.9.1
mache v3.3.0 -- brings in mache.deploy, mache.jigsaw and mache.parallel as well as module updates on many machines and several bug fixes
moab v5.6.0
albany tag compass-2026-03-21
trilinos tag compass-2026-02-06

Testing

Only testing MALI, as MPAS-Ocean is no longer being tested regularly on Compass.

MALI with full_integration:

Chrysalis (@xylar)
- gnu and openmpi - seeing Linking errors in MALI build on Perlmutter #946 during MALI build
Perlmutter (@xylar)
- gnu and mpich - seeing Linking errors in MALI build on Perlmutter #946 during MALI build
- gnugpu and mpich - seeing Linking errors in MALI build on Perlmutter #946 during MALI build

Deployed

MALI with full_integration:

xylar · 2026-03-31T22:33:25Z

@matthewhoffman and @trhille, to test this for now, use:

./deploy.py --with-albany --deploy-spack --mache-fork xylar/mache --mache-branch update-to-3.3.0 ...

This branch is needed until I tag a 3.3.0rc2 for mache.

matthewhoffman · 2026-04-03T03:05:37Z

@xylar , can you walk me through a few more details about the transition to deploy.py?

First off, is the mache branch in your previous comment out of date? Mache branch update-to-3.3.0 doesn't exist on your mache fork. So I used fix-mache-deploy-with-mache-rc instead. I invoked it with:

./deploy.py --with-albany --deploy-spack --mache-fork xylar/mache --mache-branch fix-mache-deploy-with-mache-rc --compiler gnu --mpi mpich --machine pm-cpu

It ran great for a while and seemed much faster than the old ./conda/configure-compass-env.py script. But after finishing the jigsaw build, it died with this error:

 Running:
   env -i bash -l /global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/spack/build_compass_albany_gnu_mpich.bash

fatal: detected dubious ownership in repository at '/global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0'
To add an exception for this directory, call:

	git config --global --add safe.directory /global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0
Traceback (most recent call last):
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/bin/mache", line 6, in <module>
    sys.exit(main())
             ~~~~^^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/__main__.py", line 21, in main
    args.func(args)
    ~~~~~~~~~^^^^^^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/cli.py", line 91, in _dispatch_deploy
    run_deploy(args=args)
    ~~~~~~~~~~^^^^^^^^^^^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/run.py", line 289, in run_deploy
    spack_results = deploy_spack_envs(
        ctx=ctx,
    ...<2 lines>...
        quiet=quiet,
    )
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/spack.py", line 374, in deploy_spack_envs
    _install_spack_env(
    ~~~~~~~~~~~~~~~~~~^
        ctx=ctx,
        ^^^^^^^^
    ...<10 lines>...
        quiet=quiet,
        ^^^^^^^^^^^^
    )
    ^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/spack.py", line 914, in _install_spack_env
    check_call(cmd, log_filename=log_filename, quiet=quiet)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/bootstrap.py", line 222, in check_call
    raise subprocess.CalledProcessError(
        process.returncode, commands, output=stdout_data
    )
subprocess.CalledProcessError: Command 'env -i bash -l /global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/spack/build_compass_albany_gnu_mpich.bash' returned non-zero exit status 128.

ERROR: Deployment step failed (exit code 1). See the error output above.

Am I doing this wrong? Is this trying to deploy for the entire project? I don't think you want me interacting with /global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0 do you?
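For context on the "dubious ownership" message above: git generally refuses to operate on a repository whose directory belongs to a different user unless that path is listed under safe.directory. The sketch below is a rough approximation of that check, not git's actual implementation, and the helper name owned_by_current_user is hypothetical. It assumes a POSIX system such as Perlmutter.

```python
import os


def owned_by_current_user(path):
    """Return True if `path` is owned by the user running this process.

    Hypothetical helper: it only approximates the ownership test that
    leads git to report "dubious ownership" on a shared directory.
    """
    return os.stat(path).st_uid == os.getuid()
```

Calling this on the shared spack path from the error would be expected to return False for anyone other than the account that created that clone, which is consistent with the deployment touching a directory it should not have.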

xylar · 2026-04-03T07:19:22Z

@matthewhoffman, I'm sorry. I'm developing mache for 3 projects at once -- E3SM-Unified, Polaris and Compass. That's a situation I usually try to avoid for precisely this type of reason.

I needed to release mache 3.3.0 for Polaris yesterday. As a result, the update-to-3.3.0 branch is gone. But I neglected to update this Compass branch until just now. At this point, no --mache-fork and --mache-branch should be needed for testing.

You also don't want to deploy spack. That was a mistake in my command above.

./deploy.py --with-albany --compiler gnu --mpi mpich --machine pm-cpu
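As a small illustration of the corrected invocation, here is a sketch that assembles the same argument list in Python. It uses only the flags that appear in this thread and deliberately leaves out --deploy-spack; the function name deploy_command and its parameters are hypothetical and are not part of mache or Compass.

```python
import shlex


def deploy_command(machine, compiler, mpi, with_albany=True):
    """Build the deploy.py argument list for a user-level test deployment.

    Hypothetical helper. It never adds --deploy-spack, since deploying
    spack is reserved for the shared project space.
    """
    cmd = ["./deploy.py"]
    if with_albany:
        cmd.append("--with-albany")
    cmd += ["--compiler", compiler, "--mpi", mpi, "--machine", machine]
    return cmd


def as_shell(cmd):
    """Render the argument list as a single shell-quoted string."""
    return shlex.join(cmd)
```

Passing the list form to subprocess.run (rather than a shell string) would avoid quoting problems if a machine or compiler name ever contained unusual characters.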

matthewhoffman · 2026-04-03T14:34:02Z

Thanks, @xylar . I made a little more progress with the command you suggested. I had to make this change:

diff --git a/deploy/cli_spec.json b/deploy/cli_spec.json
index 56a951b1f..ebdea2c4c 100644
--- a/deploy/cli_spec.json
+++ b/deploy/cli_spec.json
@@ -1,7 +1,7 @@
 {
   "meta": {
     "software": "compass",
-    "mache_version": "3.3.0rc2",
+    "mache_version": "3.3.0",
     "description": "Deploy compass environment"
   },
   "arguments": [
diff --git a/deploy/pins.cfg b/deploy/pins.cfg
index bfe79a90e..6ca63db19 100644
--- a/deploy/pins.cfg
+++ b/deploy/pins.cfg
@@ -4,7 +4,7 @@ bootstrap_python = 3.14
 python = 3.14
 esmf = 8.9.1
 geometric_features = 1.6.1
-mache = 3.3.0rc2
+mache = 3.3.0
 mpas_tools = 1.4.0
 otps = 2021.10
 parallelio = 2.6.9

but then I still ran into an issue of it trying to touch the deployed spack env in the e3sm project space:

 Running:
   source /global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0/share/spack/setup-env.sh
   spack env activate compass_albany_gnu_mpich
   spack config add modules:prefix_inspections:lib:[LD_LIBRARY_PATH]
   spack config add modules:prefix_inspections:lib64:[LD_LIBRARY_PATH]

==> Error: cannot write to config file [Errno 13] Permission denied: '/global/cfs/cdirs/e3sm/software/compass/pm-cpu/spack/dev_compass_2.0.0/var/spack/environments/compass_albany_gnu_mpich/.spack.yaml.tmp'
Traceback (most recent call last):
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/hooks.py", line 103, in run_hook
    result = func(context)
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy/hooks.py", line 62, in post_spack
    _set_ld_library_path_for_spack_env(
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        ctx=ctx,
        ^^^^^^^^
        spack_path=spack_path,
        ^^^^^^^^^^^^^^^^^^^^^^
        env_name=env_name,
        ^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy/hooks.py", line 212, in _set_ld_library_path_for_spack_env
    check_call(
    ~~~~~~~~~~^
        commands,
        ^^^^^^^^^
        log_filename=_get_log_filename(ctx),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        quiet=bool(getattr(ctx.args, 'quiet', False)),
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/global/cfs/cdirs/fanssie/users/hoffman2/compass/v2.0.0-alpha.1/deploy_tmp/bootstrap_pixi/.pixi/envs/default/lib/python3.14/site-packages/mache/deploy/bootstrap.py", line 222, in check_call
    raise subprocess.CalledProcessError(
        process.returncode, commands, output=stdout_data
    )

xylar · 2026-04-03T14:50:52Z

I had to make this change:

I think that's in 21d2713. Did you not have that commit or did I miss something?
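The diff quoted above changes the mache pin in two places, deploy/cli_spec.json and deploy/pins.cfg, so a stale local branch can leave them pointing at a release candidate. A minimal sketch of a consistency check follows. It assumes pins.cfg is an INI-style file readable by configparser and scans every section for a mache option, because the section names are not visible in the diff. The function names are hypothetical.

```python
import configparser
import json


def mache_pins(cli_spec_text, pins_cfg_text):
    """Collect every mache version pinned in the two deployment files."""
    pins = {}
    # cli_spec.json stores the pin under meta.mache_version (per the diff).
    spec = json.loads(cli_spec_text)
    pins["cli_spec.json"] = spec["meta"]["mache_version"]
    # Assumption: pins.cfg is INI-style; look for "mache" in any section.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(pins_cfg_text)
    for section in parser.sections():
        if parser.has_option(section, "mache"):
            pins[f"pins.cfg [{section}]"] = parser.get(section, "mache")
    return pins


def pins_agree(pins):
    """Return True if all collected pins name the same version."""
    return len(set(pins.values())) <= 1
```

In practice the two text arguments would come from reading the files in the deploy directory of the checkout being tested.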

xylar · 2026-04-03T14:51:51Z

but then I still ran into an issue of it trying to touch the deployed spack env in the e3sm project space:

Yep, that's something I need to fix. Sorry about that!
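The permission error arises because the post_spack hook runs spack config add against an environment directory that the current user cannot write to. One possible guard, shown purely as a sketch and not necessarily how the fix was implemented, is to test for write access before attempting the modification. The helper name can_modify_spack_env is hypothetical.

```python
import os


def can_modify_spack_env(env_dir):
    """Return True if this user may create files inside `env_dir`.

    Hypothetical guard: a hook could skip `spack config add` when this
    returns False, for example when the environment lives in a shared,
    read-only project space.
    """
    return os.path.isdir(env_dir) and os.access(env_dir, os.W_OK | os.X_OK)
```

Note that os.access reports what the current real user is allowed to do, so it can still differ from the eventual outcome on filesystems with additional ACLs or quotas.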

xylar · 2026-04-03T15:00:01Z

@matthewhoffman, the second issue should be fixed.

matthewhoffman · 2026-04-03T15:29:14Z

@xylar , thanks for addressing the second issue. The first must have been because I had failed to update my local branch this morning. After updating to 160d75d , ./deploy runs successfully and I'm able to load the compass env. I will move on to trying to build MALI next. One question - do you plan to add the version number back to the load_compass_pm-cpu_gnu_mpich.sh script that gets generated?

xylar · 2026-04-03T15:43:02Z

One question - do you plan to add the version number back to the load_compass_pm-cpu_gnu_mpich.sh script that gets generated?

The Compass version is in there:

export MACHE_DEPLOY_TARGET_VERSION="2.0.0-alpha.1"

It's just called something different than before. We can copy that into another environment variable if you need it.

xylar · 2026-04-03T15:43:21Z

Oh, wait, it already is:

export COMPASS_VERSION="2.0.0-alpha.1"

xylar · 2026-04-03T15:44:21Z

Are you not seeing that in your load script?

matthewhoffman · 2026-04-03T15:52:41Z

I just mean the name of the load script used to have the version in the filename, but I'm not seeing that. It's not a big deal, I was just wondering if that was intentional.

As for progress, when I compile MALI I am seeing the same PIO lib errors that you do in the issue you opened. I'm working on debugging them with help from ChatGPT and so far the obvious things are not working, but I'll keep at it while I have time.

xylar · 2026-04-03T16:16:10Z

I see. No, the load script won't include the compass version anymore. I didn't find that to be particularly useful.

matthewhoffman · 2026-04-03T16:33:37Z

I found it useful to know how old my load script was when I revisited a workdir I hadn't visited recently, but I could easily get that information from opening the load script. I have only very, very rarely needed to use an old load script, so having the load script clobbered when the version changes isn't really a concern.
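Since the version now lives inside the load script rather than in its filename, it can be recovered without opening the file by hand. The sketch below parses export lines of the form shown earlier in this thread. The function name read_exports is hypothetical, and the parser only handles simple export NAME="value" lines, not arbitrary shell syntax.

```python
import re

# Matches lines such as: export COMPASS_VERSION="2.0.0-alpha.1"
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def read_exports(script_text):
    """Return a dict of simple `export NAME=value` assignments."""
    exports = {}
    for line in script_text.splitlines():
        match = EXPORT_RE.match(line)
        if match:
            name, value = match.groups()
            # Drop surrounding whitespace and one layer of quotes.
            exports[name] = value.strip().strip("\"'")
    return exports
```

Reading load_compass_pm-cpu_gnu_mpich.sh this way and looking up COMPASS_VERSION would show which Compass version an old work directory was set up with.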

xylar · 2026-04-03T17:53:56Z

The way things work now, you can't have more than one pixi environment in a given branch, so it isn't really meaningful to have different versions of the load script for different versions.

You should consider using different worktrees for different versions.

This keeps ESMF from stepping on SCORPIO's toes by installing its own ParallelIO.

xylar · 2026-04-04T12:54:24Z

@matthewhoffman, I think the cleanest solution is the same as what Polaris does -- put ESMF in a software environment and SCORPIO in a library environment so they don't clobber each other. We only use ESMF as a binary so that works out.

This is no longer needed now that ESMF is in the software environment.

xylar · 2026-04-04T13:25:57Z

I'm regenerating the spack environments. I'll test this out on both Perlmutter and Chrysalis (the latter before it's too late!).

xylar · 2026-04-04T15:57:09Z

@matthewhoffman, I think the linking issue is likely fixed. I'm trying a rebuild of MALI now on Chrysalis. I'll try Perlmutter tomorrow. But looking good so far...

xylar · 2026-04-04T18:02:52Z

The Chrysalis and Perlmutter-CPU builds with gnu worked fine (I didn't try to run the test suite). I'm building with Perlmutter-GPU, gnugpu now...

matthewhoffman · 2026-04-04T20:39:15Z

@xylar , I was also able to build the conda env using the new deploy.py script on pm-cpu and successfully compile MALI at develop. When I run our full_integration suite, however, I see a strange mix of pass and fail. I expected the baseline comparison for runs with Albany to fail, but I'm also seeing some execution failures. There's a lot to sift through there, so I'm just going to post the full results in a collapsed section for now. I'm seeing some PIO errors killing the model, and I'm also seeing Albany strangely abort after the solver completes. I think I'll need to debug all these failures before moving things forward.

@mperego , can you look at /pscratch/sd/h/hoffman2/COMPASS/TESTING/landice/dome/2000m/fo_decomposition_test/1proc_run/log.albany.0000.out and see if you can tell what happened to that run?

Full compass results:

full_integration results

Results at /pscratch/sd/h/hoffman2/COMPASS/TESTING

verifying deployed compass version...
  Verified version 2.0.0-alpha.1.
loading compute pixi env...
   pixi env loaded.
activating spack env...

Activating Modules:
  1) cray-netcdf-hdf5parallel/4.9.2.1

   spack env activated.
loading compass environment variables...
   compass environment variables loaded.
landice/dome/2000m/sia_restart_test
  * step: setup_mesh
  * step: full_run
  * step: restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
  test runtime:        00:24
landice/dome/2000m/sia_decomposition_test
  * step: setup_mesh
  * step: 1proc_run
  * step: 4proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
  test runtime:        00:06
landice/dome/variable_resolution/sia_restart_test
  * step: setup_mesh
  * step: full_run
  * step: restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
  test runtime:        00:07
landice/dome/variable_resolution/sia_decomposition_test
  * step: setup_mesh
  * step: 1proc_run
  * step: 4proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
  test runtime:        00:05
landice/enthalpy_benchmark/A
  * step: setup_mesh
  * step: phase1
  * step: phase2
  * step: phase3
  * step: visualize
  test execution:      SUCCESS
  baseline comparison: PASS
  test runtime:        00:23
landice/eismint2/decomposition_test
  * step: setup_mesh
  * step: 1proc_run
  * step: 4proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
  test runtime:        00:16
landice/eismint2/enthalpy_decomposition_test
  * step: setup_mesh
  * step: 1proc_run
  * step: 4proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
  test runtime:        00:18
landice/eismint2/restart_test
  * step: setup_mesh
  * step: full_run
  * step: restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
  test runtime:        00:21
landice/eismint2/enthalpy_restart_test
  * step: setup_mesh
  * step: full_run
  * step: restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: PASS
  test runtime:        00:17
landice/greenland/sia_restart_test
  * step: full_run
  * step: restart_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_greenland_sia_restart_test.log
  test runtime:        00:10
landice/greenland/sia_decomposition_test
  * step: 16proc_run
  * step: 32proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_greenland_sia_decomposition_test.log
  test runtime:        00:08
landice/hydro_radial/restart_test
  * step: setup_mesh
  * step: full_run
  * step: visualize_full_run
  * step: restart_run
  * step: visualize_restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_hydro_radial_restart_test.log
  test runtime:        00:41
landice/hydro_radial/decomposition_test
  * step: setup_mesh
  * step: 1proc_run
  * step: visualize_1proc_run
  * step: 3proc_run
  * step: visualize_3proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_hydro_radial_decomposition_test.log
  test runtime:        00:11
landice/humboldt/mesh-3km_decomposition_test/velo-none_calving-none_subglacialhydro
  * step: 16proc_run
  * step: 32proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-none_subglacialhydro.log
  test runtime:        00:18
landice/humboldt/mesh-3km_restart_test/velo-none_calving-none_subglacialhydro
  * step: full_run
  * step: restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_humboldt_mesh-3km_restart_test_velo-none_calving-none_subglacialhydro.log
  test runtime:        00:48
landice/dome/2000m/fo_decomposition_test
  * step: setup_mesh
  * step: 1proc_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_dome_2000m_fo_decomposition_test.log
  test runtime:        00:11
landice/dome/2000m/fo_restart_test
  * step: setup_mesh
  * step: full_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_dome_2000m_fo_restart_test.log
  test runtime:        00:08
landice/dome/variable_resolution/fo_decomposition_test
  * step: setup_mesh
  * step: 1proc_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_dome_variable_resolution_fo_decomposition_test.log
  test runtime:        00:11
landice/dome/variable_resolution/fo_restart_test
  * step: setup_mesh
  * step: full_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_dome_variable_resolution_fo_restart_test.log
  test runtime:        00:10
landice/circular_shelf/decomposition_test
  * step: setup_mesh
  * step: 1proc_run
  * step: 4proc_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_circular_shelf_decomposition_test.log
  test runtime:        00:20
landice/greenland/fo_decomposition_test
  * step: 16proc_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_greenland_fo_decomposition_test.log
  test runtime:        00:13
landice/greenland/fo_restart_test
  * step: full_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_greenland_fo_restart_test.log
  test runtime:        00:12
landice/thwaites/fo_decomposition_test
  * step: 16proc_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_thwaites_fo_decomposition_test.log
  test runtime:        00:14
landice/thwaites/fo_restart_test
  * step: full_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_thwaites_fo_restart_test.log
  test runtime:        00:12
landice/thwaites/fo-depthInt_decomposition_test
  * step: 16proc_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_thwaites_fo-depthInt_decomposition_test.log
  test runtime:        00:11
landice/thwaites/fo-depthInt_restart_test
  * step: full_run
      Failed
  test execution:      ERROR
  see: case_outputs/landice_thwaites_fo-depthInt_restart_test.log
  test runtime:        00:13
landice/humboldt/mesh-3km_restart_test/velo-fo_calving-von_mises_stress_damage-threshold_faceMelting
  * step: full_run
  * step: restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting.log
  test runtime:        00:49
landice/humboldt/mesh-3km_restart_test/velo-fo-depthInt_calving-von_mises_stress_damage-threshold_faceMelting
  * step: full_run
  * step: restart_run
  test execution:      SUCCESS
  test validation:     PASS
  baseline comparison: FAIL
  see: case_outputs/landice_humboldt_mesh-3km_restart_test_velo-fo-depthInt_calving-von_mises_stress_damage-threshold_faceMelting.log
  test runtime:        00:20
Test Runtimes:
00:24 PASS landice_dome_2000m_sia_restart_test
00:06 PASS landice_dome_2000m_sia_decomposition_test
00:07 PASS landice_dome_variable_resolution_sia_restart_test
00:05 PASS landice_dome_variable_resolution_sia_decomposition_test
00:23 PASS landice_enthalpy_benchmark_A
00:16 PASS landice_eismint2_decomposition_test
00:18 PASS landice_eismint2_enthalpy_decomposition_test
00:21 PASS landice_eismint2_restart_test
00:17 PASS landice_eismint2_enthalpy_restart_test
00:10 FAIL landice_greenland_sia_restart_test
00:08 FAIL landice_greenland_sia_decomposition_test
00:41 FAIL landice_hydro_radial_restart_test
00:11 FAIL landice_hydro_radial_decomposition_test
00:18 FAIL landice_humboldt_mesh-3km_decomposition_test_velo-none_calving-none_subglacialhydro
00:48 FAIL landice_humboldt_mesh-3km_restart_test_velo-none_calving-none_subglacialhydro
00:11 FAIL landice_dome_2000m_fo_decomposition_test
00:08 FAIL landice_dome_2000m_fo_restart_test
00:11 FAIL landice_dome_variable_resolution_fo_decomposition_test
00:10 FAIL landice_dome_variable_resolution_fo_restart_test
00:20 FAIL landice_circular_shelf_decomposition_test
00:13 FAIL landice_greenland_fo_decomposition_test
00:12 FAIL landice_greenland_fo_restart_test
00:14 FAIL landice_thwaites_fo_decomposition_test
00:12 FAIL landice_thwaites_fo_restart_test
00:11 FAIL landice_thwaites_fo-depthInt_decomposition_test
00:13 FAIL landice_thwaites_fo-depthInt_restart_test
00:49 FAIL landice_humboldt_mesh-3km_restart_test_velo-fo_calving-von_mises_stress_damage-threshold_faceMelting
00:20 FAIL landice_humboldt_mesh-3km_restart_test_velo-fo-depthInt_calving-von_mises_stress_damage-threshold_faceMelting
Total runtime 08:18
FAIL: 19 tests failed, see above.

mperego · 2026-04-06T13:52:33Z

Hi Matt, Thanks for trying it out. The test you pointed me to fails with a floating point exception. It would be interesting to know whether it would pass if you don't check for FPEs. Anyway, I can try to reproduce it in the next few days and see if we can fix it. Mauro

matthewhoffman · 2026-04-06T17:51:02Z

@mperego, I recompiled with DEBUG=false to disable the floating-point exception (FPE) checks. With that build, the landice/dome/2000m/fo_decomposition_test/ test runs without error and the decomposition validation passes. (The baseline comparison fails, but I expect the Albany update to change answers slightly, so that is not a concern.)

Do you have an idea of how to find the FPE so it can be fixed? Or do you want me to try to track anything down further on my end? Assuming you're able to track it down, it sounds like we'll want to update spack to use that updated version of Albany in this PR (cc: @xylar).
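
For readers unfamiliar with FPE trapping, here is a minimal illustration of why the same computation can abort in one build and run to completion in another. It is only an analogy: Python's standard `decimal` module stands in for the compiler's floating-point trap settings, the `divide` helper is hypothetical, and nothing here is MALI or Albany code. With the trap enabled, an invalid operation such as 0/0 raises immediately. With the trap disabled, the same operation quietly returns NaN and only sets a flag.

```python
from decimal import Context, Decimal, DivisionByZero, InvalidOperation


def divide(num, den, trap):
    """Divide under a context that either traps or merely flags exceptions.

    trap=True loosely mimics a build that checks for FPEs;
    trap=False loosely mimics a build that does not.
    """
    traps = [InvalidOperation, DivisionByZero] if trap else []
    ctx = Context(traps=traps)
    result = ctx.divide(Decimal(num), Decimal(den))
    # The flag records that the invalid operation happened, trapped or not.
    return result, bool(ctx.flags[InvalidOperation])


if __name__ == "__main__":
    # Without trapping: the run continues and carries a NaN forward.
    value, flagged = divide(0, 0, trap=False)
    print("untrapped result:", value, "| invalid flag set:", flagged)

    # With trapping: the same operation stops the run at the offending line.
    try:
        divide(0, 0, trap=True)
    except InvalidOperation as exc:
        print("trapped:", type(exc).__name__)
```

The untrapped path still records that something went wrong, so a passing run does not by itself show that the underlying operation is valid.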

xylar force-pushed the switch-to-mache-deploy branch from 9c93e54 to 0303b90 Compare March 21, 2026 13:14

xylar force-pushed the switch-to-mache-deploy branch from 0303b90 to ba7c900 Compare March 21, 2026 13:41

xylar mentioned this pull request Mar 21, 2026

Add optional CLI for downstream software in mache.deploy E3SM-Project/mache#365

Merged

5 tasks

xylar force-pushed the switch-to-mache-deploy branch from 74bb269 to cec99ef Compare March 21, 2026 16:07

This was referenced Mar 21, 2026

Errors when trying to build MALI on Perlmutter-CPU with Gnu #945

Closed

Add a mechanism to exclude packages from spack yaml/shell E3SM-Project/mache#366

Merged

Linking errors in MALI build on Perlmutter #946

Open

xylar force-pushed the switch-to-mache-deploy branch 2 times, most recently from fd971e0 to 7f43434 Compare March 31, 2026 17:58

xylar added 11 commits April 3, 2026 09:54

Update to v2.0.0-alpha.1

8cae87a

Switch to pyproject.toml

ba7dce3

Move deployment to deploy.py using mache.deploy

8e67e7c

Update dependencies

64cf3a5

Remove compass.load and use mache.deploy instead

95e5553

Update provenance to use pixi

44ffe5a

Add ignores related to mache.deploy

c2fbce6

Update CI for mache.deploy

0e2267e

Update the docs following move to mache.deploy

c06dbb2

Fix machine name detection in setup.py

915f52f

Remove configs for unsupported machines

e8d642e

xylar added 6 commits April 3, 2026 09:54

Add spack libs to LD_LIBRARY_PATH in post_spack() hook

f66fdd2

Update to albany tag compass-2026-03-21

0d8d4fb

Update to mache 3.2.0

36b42fc

Update to mache 3.3.0rc2

1ae76f2

Remove +uvm from trilinos and albany spack variants

071b895

update to mache 3.3.0

21d2713

xylar force-pushed the switch-to-mache-deploy branch from d3cc399 to 21d2713 Compare April 3, 2026 07:54

Fix post_spack to only run with --deploy-spack

160d75d

Switch to separate software and lib spack environments

fad7b06

This keeps ESMF from stepping on SCORPIO's toes by installing its own ParallelIO.

Don't remove ESMF libraries in post_spack hook

f762594

This is no longer needed now that ESMF is in the software environment.

Add groups for updating permissions

069483c
